-
-
Notifications
You must be signed in to change notification settings - Fork 227
Fix deserialization of unions #332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@dbanty I opened a new PR, since the git history and merge conflicts with the template renaming (adding |
Codecov Report
@@ Coverage Diff @@
## main #332 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 47 47
Lines 1410 1420 +10
=========================================
+ Hits 1410 1420 +10
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, nice to see the None/Unset type string refactor make it back in! I have a few suggestions and Codecov seems to be saying we're missing a unit test somewhere.
not_required_one_of_models = UNSET | ||
if not isinstance(self.not_required_one_of_models, Unset): | ||
not_required_one_of_models = self.not_required_one_of_models.to_dict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will have to check later why this is the case. I assume there's some difficulty with handling the type in the template which is why we're checking multiple isinstances here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the OpenAPI:
"not_required_one_of_models": {
"oneOf": [
{
"ref": "#components/schemas/FreeFormModel"
},
{
"ref": "#components/schemas/ModelWithUnionProperty"
}
],
"nullable": false
},
My guess is that the inner properties are not required? Kind of confusing what the semantics should be...
if not isinstance(data, str): | ||
raise TypeError() | ||
a_camel_date_time = isoparse(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think letting isoparse
raise the error (if we're going to raise it anyway) would be better because it would give more context as to what is wrong. It's especially confusing to readers to see this since the input type of the function is str
, so why would this not be one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, this is here to get the correct Union option more easily? Maybe then we need to exclude it if no Union? Not really necessary for this PR, there are plenty of things like this that need cleanup at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah, this is just a special case when there's a union of only 1 element. I think it's pretty unlikely to happen in actual usage. Let me try to fix it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out to be unnecessary if we're changing the input type to be object
.
if not isinstance(data, dict): | ||
raise TypeError() | ||
one_of_models = FreeFormModel.from_dict(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here re: confusion. If we can't be confident in the data passed in, the type annotation should be : Any
and we do the runtime check. If we are confident of what's passed in, we can do the correct type annotation and not a runtime check.
I think the only reason this would be the wrong data type is if the OpenAPI spec was wrong or someone made a breaking change to the API. Either way we can't properly handle all the cases that could fail. So I think just the proper type annotation without the runtime check is probably the better approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a good point. Actually initially I didn't have a runtime check, but then MyPy throws an error, because it's not type-safe.
I'm starting to think that maybe the original Any
annotation is better, since we can't be confident in the data passed in. But I think it should be object
, not Any
, because Any
causes type errors to be ignored, whereas object
will force us to be type-safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good regarding object
vs Any
if that ends up working better.
nullable_model = None | ||
_nullable_model = d.pop("nullable_model") | ||
if _nullable_model is not None: | ||
nullable_model = AModelNullableModel.from_dict(cast(Dict[str, Any], _nullable_model)) | ||
nullable_model = AModelNullableModel.from_dict(_nullable_model) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised mypy
doesn't get mad about us passing an Any
to a function which takes Dict[str, Any]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's because Any
has weird semantics, in that it basically just means "ignore any type errors". If we were to change the signature to object
, then it would error.
def _get_inner_type_strings(self, json: bool = False) -> List[str]: | ||
inner_types = [p.get_type_string(no_optional=True, json=json) for p in self.inner_properties] | ||
unique_inner_types = list(dict.fromkeys(inner_types)) | ||
return unique_inner_types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _get_inner_type_strings(self, json: bool = False) -> List[str]: | |
inner_types = [p.get_type_string(no_optional=True, json=json) for p in self.inner_properties] | |
unique_inner_types = list(dict.fromkeys(inner_types)) | |
return unique_inner_types | |
def _get_inner_type_strings(self, json: bool = False) -> Set[str]: | |
return {p.get_type_string(no_optional=True, json=json) for p in self.inner_properties} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think I tried this, but IIRC it was causing the codegen to be non-deterministic 😬 (or at least unpredictable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. We could, when actually joining them (I think that's the only place they are used for output?) add a sorted
to make it deterministic so it passes e2e tests?
def get_type_strings_in_union(self, no_optional: bool = False, json: bool = False) -> List[str]: | ||
type_strings = self._get_inner_type_strings(json=json) | ||
|
||
if no_optional or (self.required and not self.nullable): | ||
return type_strings | ||
elif self.required and self.nullable: | ||
return ["None"] + type_strings | ||
elif not self.required and self.nullable: | ||
return ["Unset", "None"] + type_strings | ||
else: | ||
return ["Unset"] + type_strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def get_type_strings_in_union(self, no_optional: bool = False, json: bool = False) -> List[str]: | |
type_strings = self._get_inner_type_strings(json=json) | |
if no_optional or (self.required and not self.nullable): | |
return type_strings | |
elif self.required and self.nullable: | |
return ["None"] + type_strings | |
elif not self.required and self.nullable: | |
return ["Unset", "None"] + type_strings | |
else: | |
return ["Unset"] + type_strings | |
def get_type_strings_in_union(self, no_optional: bool = False, json: bool = False) -> Set[str]: | |
type_strings = self._get_inner_type_strings(json=json) | |
if no_optional: | |
return type_strings | |
if self.nullable: | |
type_strings.add("None") | |
if not self.required: | |
type_strings.add("Unset") | |
return type_strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Same as above, concern about non-determinism)
{{ property.python_name }} = {{ property.reference.class_name }}(_{{ property.python_name }}) | ||
{% endif %} | ||
{% endmacro %} | ||
|
||
{% macro check_type_for_construct(property, source) %}{% if property.value_type == str %}isinstance({{ source }}, str){% else %}isinstance({{ source }}, int){% endif %}{% endmacro %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{% macro check_type_for_construct(property, source) %}{% if property.value_type == str %}isinstance({{ source }}, str){% else %}isinstance({{ source }}, int){% endif %}{% endmacro %} | |
{% macro check_type_for_construct(property, source) %}isinstance({{ source }}, {{ property.value_type }}){% endmacro %} |
@@ -11,6 +11,9 @@ class Unset: | |||
|
|||
UNSET: Unset = Unset() | |||
|
|||
# Used as `FileProperty._json_type_string` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Used as `FileProperty._json_type_string` | |
{# Used as `FileProperty._json_type_string` #} |
No reason to have that comment make it to generated code I think
I believe the failing test will be fixed by https://github.com/triaxtec/openapi-python-client/pull/335/files#diff-23d4c94f056bacffca42533cd744b9df1851d32b1947472bc1683f52cf977da6 (🤯 ) |
if len(inner_types) == 1: | ||
return inner_types[0] | ||
else: | ||
return f"Union[{', '.join(inner_types)}]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return f"Union[{', '.join(inner_types)}]" | |
return f"Union[{', '.join(sorted(inner_types))}]" |
I think this will fix the determinism issue? While also maintaining the performance and reduced code of using sets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, I think the only required change is to cover the rest of the union type functions with unit tests.
return inner_types.pop() | ||
else: | ||
return f"Union[{', '.join(sorted(inner_types))}]" | ||
|
||
def get_base_type_string(self) -> str: | ||
return self._get_type_string_from_inner_type_strings(self._get_inner_type_strings(json=False)) | ||
|
||
def get_base_json_type_string(self) -> str: | ||
return self._get_type_string_from_inner_type_strings(self._get_inner_type_strings(json=True)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing code coverage on a couple of these lines
sub_prop, schemas = property_from_data( | ||
name=name, required=required, data=sub_prop_data, schemas=schemas, parent_name=parent_name | ||
name=f"{name}_item{i}", required=required, data=sub_prop_data, schemas=schemas, parent_name=parent_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally "item" makes me think of a collection (e.g. list) and I think "type" might be more clear? But that's very subjective so up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we should probably set required=True
here. I think that would solve our extra Unset
checks in the generated code. Really either the whole Union
is required / nullable or None of it is. Not up to the individual types I don't think.
Would changing this cause any other problems you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds reasonable, but is it OK if we punt changing required=True
to a future PR, to keep the scope of this PR limited, unless it's necessary here?
@@ -8,6 +8,8 @@ | |||
{{ construct_template(construct_function, property, source, initial_value=initial_value) }} | |||
{% endmacro %} | |||
|
|||
{% macro check_type_for_construct(property, source) %}isinstance({{ source }}, dict){% endmacro %} | |||
|
|||
{% macro transform(property, source, destination, declare_type=True) %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need a better name for this macro than "transform" at some point 😅even I can't remember what this one is for sometimes. Not a note on this PR, just a note on how I need to put some design thought into the templates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, yeah...maybe serialize
and deserialize
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or to_json
and from_json
...
{% if "None" in property.get_type_strings_in_union(json=True) %} | ||
if data is None: | ||
return data | ||
{% endif %} | ||
{% if "Unset" in property.get_type_strings_in_union(json=True) %} | ||
if isinstance(data, Unset): | ||
return data | ||
{% endif %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use property.required
and property.nullable
for this? Maybe that's not robust enough, not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that depends on the unrolling unions thing you had mentioned below, since a nested property could still be nullable. And technically I guess you could have a none-type in the union.
{% from "property_templates/" + inner_property.template import construct %} | ||
{{ inner_property.python_name }}: {{ inner_property.get_type_string() }} | ||
{% from "property_templates/" + inner_property.template import construct, check_type_for_construct %} | ||
if not {{ check_type_for_construct(inner_property, "data") }}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory we could start doing:
if {{ check_type_for_construct(inner_property, "data") }}:
{{ construct(inner_property, "data", initial_value="UNSET") }}
return {{ inner_property.python_name }}
and avoid the try/except stuff, yeah?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it would work out-of-the-box, because we're also catching non-type errors, like parsing dates (isoparse("foo")
). Agreed that ideally I think it would be more robust for us eventually move when possible away from the try/except
(especially with the generic exception catching).
@@ -24,20 +36,25 @@ def _parse_{{ property.python_name }}(data: Any) -> {{ property.get_type_string( | |||
{{ property.python_name }} = _parse_{{ property.python_name }}({{ source }}) | |||
{% endmacro %} | |||
|
|||
{# For now we assume there will be no unions of unions #} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future we should probably be unrolling unions while building the properties. I don't imagine that would be too difficult, just a check on each inner_property to see if it's a UnionProperty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah, that makes sense.
Hi @dbanty, sorry for the delay, just got back from vacation. I addressed two of your comments, and the rest I was wondering if we can leave to be follow-ups in order to push this through? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks! Sorry this one took so long to get through
Codecov seems to be having some issues reporting final coverage. Can you pull in main to make the checks re-run? |
@dbanty Merged in main and added another unit test, looks like checks pass now! |
Deserialization of unions in
from_dict
was not working properly, because it failed to account for cases where the union property isNone
orUnset
. This PR adds explicit checks forNone
orUnset
.In addition, the type of the value to deserialize was previously set to
Any
, which was masking type errors. In order to fix these type errors, I introduced a new concept ofcheck_type_for_construct
for each property, which allows the type to be checked before attempting deserialization.In the future, I think we should refine the
try/except
pattern; it's not good practice to rely on a catch-allexcept
when trying to deserialize each type within the union, because we don't know what exception was actually raised.As part of this PR, I also refactored the
type_string
method to reuse logic across properties. Also as part of this PR, I had include #335, which had a circular dependency on this PR:Previously, if you had a
oneOf
of inlined models, each inline model was generated with the same name, causing effectively only the last model to be included (since it overwrote all the previous models).In this PR, we give it an autogenerated name, to avoid the name clash.